Improve trans and untrans with AVX512 #117
Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com> Co-authored-by: vesslanjin <jun.i.jin@intel.com>
For the performance test of AVX512 against AVX2, I tested 4-byte elements, with the element size varying from 8 to 120 (in steps of 8),
@jrs65 and @kiyo-masui Could you help check whether enabling this feature is OK for this repo?
@jrs65 and @kiyo-masui Please help check this, in case it was missed.
Hi @HackToday. Sorry for the belated response, it's been a busy end to the semester for me (and for Kiyo too, I imagine). Thanks for putting this together, it's definitely appreciated. Your code looks good to me, but I need to find an AVX512 machine to run the tests on, as I don't think GitHub Actions uses any hosts that support AVX512. Also, I'm curious whether you have any benchmarks of this. How much does AVX512 support speed things up?
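For anyone wanting to check whether a given host (a CI runner, for example) can run the AVX512 code path at all, here is a minimal sketch. It assumes GCC or Clang on x86, and it assumes the relevant feature flags are AVX512F and AVX512BW, which this thread does not state. `host_has_avx512` is a hypothetical helper name, not part of bitshuffle.

```c
/* Runtime check for AVX512 support (sketch; assumes GCC/Clang on x86). */
static int host_has_avx512(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    /* Populate the compiler's CPU feature table before querying it. */
    __builtin_cpu_init();
    /* Assumption: the code path needs both AVX512F and AVX512BW. */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512bw");
#else
    /* Other compilers or architectures: report no support. */
    return 0;
#endif
}
```

A test runner could call `host_has_avx512()` at start-up and skip the AVX512 tests when it returns 0.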
Hi @jrs65, thanks for your reply. On a system with AVX512 available, I tested with the PR changes and measured the following
The tests cover total element sizes from 8 to 120 (8, 16, 24, 32, etc., in steps of 8, as shown on the x-axis of Fig. 1), with 4-byte elements. The speedup ratio of AVX512 over AVX2 ranges from 0.94x to 1.5x; please see Fig. 1. Even in the cases where AVX512 is not faster than AVX2, it keeps nearly the same performance. Please let me know if you need more info.
@jrs65 I have added one more improvement (the untrans part within bitshuffle); it uses AVX512 in the same way as trans does. 8-byte elements also see the following improvement (larger sizes achieve a higher speedup ratio, reaching up to 1.5x).
@jrs65 and @kiyo-masui Pinging in case anything was missed. BTW, the CI workflows seem to need approval to run.
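To make the relationship between trans and untrans concrete, here is a minimal scalar sketch of a bit-level transpose and its inverse. It is not the AVX512 code from this PR, and the output layout and bit ordering are assumptions chosen for illustration, so they may not match bitshuffle's actual format. `bit_trans_scalar` and `bit_untrans_scalar` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit-transpose n elements of elem_size bytes each (n a multiple of 8).
 * Output row b holds bit b of every element, packed 8 elements per byte.
 * Assumed layout, for illustration only. */
static void bit_trans_scalar(const uint8_t *in, uint8_t *out,
                             size_t n, size_t elem_size)
{
    assert(n % 8 == 0);
    size_t nbits = 8 * elem_size;   /* bits per element */
    size_t row_bytes = n / 8;       /* bytes per output bit-row */
    memset(out, 0, n * elem_size);
    for (size_t e = 0; e < n; e++) {
        for (size_t b = 0; b < nbits; b++) {
            uint8_t bit = (in[e * elem_size + b / 8] >> (b % 8)) & 1u;
            out[b * row_bytes + e / 8] |= (uint8_t)(bit << (e % 8));
        }
    }
}

/* Inverse of bit_trans_scalar: rebuild the original elements. */
static void bit_untrans_scalar(const uint8_t *in, uint8_t *out,
                               size_t n, size_t elem_size)
{
    assert(n % 8 == 0);
    size_t nbits = 8 * elem_size;
    size_t row_bytes = n / 8;
    memset(out, 0, n * elem_size);
    for (size_t e = 0; e < n; e++) {
        for (size_t b = 0; b < nbits; b++) {
            uint8_t bit = (in[b * row_bytes + e / 8] >> (e % 8)) & 1u;
            out[e * elem_size + b / 8] |= (uint8_t)(bit << (b % 8));
        }
    }
}
```

Running `bit_trans_scalar` followed by `bit_untrans_scalar` on the same buffer sizes should return the original bytes. A vectorized version has to preserve that round-trip property, which makes a scalar reference like this a convenient correctness check.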
Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>
Hi @HackToday. Thanks for all your efforts here, and apologies for the slow responses. I've got the code built and running on one of my own machines (the cluster we use has some AVX512 nodes), and on the machine that you gave me access to elsewhere. Everything seems to run fine, and with a nice speed boost. I'm going to merge your code in now. I'll wait a few weeks to cut a release (mostly as I'm going on vacation), but also so I can see about merging in a few other outstanding PRs.
Thanks @jrs65 for your time and your help with the verification.